{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LABXX: What-if Tool: Model Interpretability Using Mortgage Data \n",
    "\n",
    "**Learning Objectives**\n",
    "\n",
    "1. Create a What-if Tool visualization\n",
    "2. What-if Tool exploration using the XGBoost Model\n",
    " \n",
    " \n",
    "## Introduction \n",
    "\n",
    "This notebook shows how to use the [What-if Tool (WIT)](https://pair-code.github.io/what-if-tool/) on a deployed [Cloud AI Platform](https://cloud.google.com/ai-platform/) model. The What-If Tool provides an easy-to-use interface for expanding understanding of black-box classification and regression ML models. With the plugin, you can perform inference on a large set of examples and immediately visualize the results in a variety of ways. Additionally, examples can be edited manually or programmatically and re-run through the model in order to see the results of the changes. It contains tooling for investigating model performance and fairness over subsets of a dataset.  The purpose of the tool is to give people a simple, intuitive, and powerful way to explore and investigate trained ML models through a visual interface with absolutely no code required.\n",
    "\n",
    "[Extreme Gradient Boosting (XGBoost)](https://xgboost.ai/) is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.) artificial neural networks tend to outperform all other algorithms or frameworks. However, when it comes to small-to-medium structured/tabular data, decision tree based algorithms are considered best-in-class right now. Please see the chart below for the evolution of tree-based algorithms over the years.\n",
    "\n",
    "*You don't need your own cloud project* to run this notebook. \n",
    "\n",
    "** UPDATE LINK BEFORE PRODUCTION **:  Each learning objective will correspond to a __#TODO__ in the [student lab notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/gwendolyn-dev/courses/machine_learning/deepdive2/ml_on_gc/what_if_mortgage.ipynb)) -- try to complete that notebook first before reviewing this solution notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up environment variables and load necessary libraries \n",
    "We will start by importing the necessary libraries for this lab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Y93EHw56Vtid"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python Version:  3\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "python_version = sys.version_info[0]\n",
    "print(\"Python Version: \", python_version)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install witwidget"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "CosDxuLy7M4Q"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import witwidget\n",
    "\n",
    "from witwidget.notebook.visualization import WitWidget, WitConfigBuilder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "bFIxtguO1In_"
   },
   "source": [
    "## Loading the mortgage test dataset\n",
    "\n",
    "The model we'll be exploring here is a binary classification model built with XGBoost and trained on a [mortgage dataset](https://www.ffiec.gov/hmda/hmdaflat.htm). It predicts whether or not a mortgage application will be approved. In this section we'll:\n",
    "\n",
    "* Download some test data from Cloud Storage and load it into a numpy array + Pandas DataFrame\n",
    "* Preview the features for our model in Pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "9BngZjdsO6Mr"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Copying gs://mortgage_dataset_files/data.pkl...\n",
      "| [1 files][104.0 MiB/104.0 MiB]                                                \n",
      "Operation completed over 1 objects/104.0 MiB.                                    \n",
      "Copying gs://mortgage_dataset_files/x_test.npy...\n",
      "/ [1 files][172.0 KiB/172.0 KiB]                                                \n",
      "Operation completed over 1 objects/172.0 KiB.                                    \n",
      "Copying gs://mortgage_dataset_files/y_test.npy...\n",
      "/ [1 files][  628.0 B/  628.0 B]                                                \n",
      "Operation completed over 1 objects/628.0 B.                                      \n"
     ]
    }
   ],
   "source": [
    "# Download our Pandas dataframe and our test features and labels\n",
    "!gcloud storage cp gs://mortgage_dataset_files/data.pkl .\n",    "!gcloud storage cp gs://mortgage_dataset_files/x_test.npy .\n",    "!gcloud storage cp gs://mortgage_dataset_files/y_test.npy ."   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preview the Features \n",
    "\n",
    "Preview the features from our model as a pandas DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "GkHavVlmGYlk"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>as_of_year</th>\n",
       "      <th>occupancy</th>\n",
       "      <th>loan_amt_thousands</th>\n",
       "      <th>county_code</th>\n",
       "      <th>applicant_income_thousands</th>\n",
       "      <th>population</th>\n",
       "      <th>ffiec_median_fam_income</th>\n",
       "      <th>tract_to_msa_income_pct</th>\n",
       "      <th>num_owner_occupied_units</th>\n",
       "      <th>num_1_to_4_family_units</th>\n",
       "      <th>...</th>\n",
       "      <th>purchaser_type_Life insurance company, credit union, mortgage bank, or finance company</th>\n",
       "      <th>purchaser_type_Loan was not originated or was not sold in calendar year covered by register</th>\n",
       "      <th>purchaser_type_Other type of purchaser</th>\n",
       "      <th>purchaser_type_Private securitization</th>\n",
       "      <th>hoepa_status_HOEPA loan</th>\n",
       "      <th>hoepa_status_Not a HOEPA loan</th>\n",
       "      <th>lien_status_Not applicable (purchased loans)</th>\n",
       "      <th>lien_status_Not secured by a lien</th>\n",
       "      <th>lien_status_Secured by a first lien</th>\n",
       "      <th>lien_status_Secured by a subordinate lien</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>310650</th>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>110.0</td>\n",
       "      <td>119.0</td>\n",
       "      <td>55.0</td>\n",
       "      <td>5930.0</td>\n",
       "      <td>64100.0</td>\n",
       "      <td>98.81</td>\n",
       "      <td>1305.0</td>\n",
       "      <td>1631.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>630129</th>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>480.0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>270.0</td>\n",
       "      <td>4791.0</td>\n",
       "      <td>90300.0</td>\n",
       "      <td>144.06</td>\n",
       "      <td>1420.0</td>\n",
       "      <td>1450.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>715484</th>\n",
       "      <td>2016</td>\n",
       "      <td>2</td>\n",
       "      <td>240.0</td>\n",
       "      <td>59.0</td>\n",
       "      <td>96.0</td>\n",
       "      <td>3439.0</td>\n",
       "      <td>105700.0</td>\n",
       "      <td>104.62</td>\n",
       "      <td>853.0</td>\n",
       "      <td>1076.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>887708</th>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>76.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>3952.0</td>\n",
       "      <td>61300.0</td>\n",
       "      <td>90.93</td>\n",
       "      <td>1272.0</td>\n",
       "      <td>1666.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>719598</th>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>100.0</td>\n",
       "      <td>127.0</td>\n",
       "      <td>70.0</td>\n",
       "      <td>2422.0</td>\n",
       "      <td>46400.0</td>\n",
       "      <td>88.37</td>\n",
       "      <td>650.0</td>\n",
       "      <td>1006.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 44 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        as_of_year  occupancy  loan_amt_thousands  county_code  \\\n",
       "310650        2016          1               110.0        119.0   \n",
       "630129        2016          1               480.0         33.0   \n",
       "715484        2016          2               240.0         59.0   \n",
       "887708        2016          1                76.0         65.0   \n",
       "719598        2016          1               100.0        127.0   \n",
       "\n",
       "        applicant_income_thousands  population  ffiec_median_fam_income  \\\n",
       "310650                        55.0      5930.0                  64100.0   \n",
       "630129                       270.0      4791.0                  90300.0   \n",
       "715484                        96.0      3439.0                 105700.0   \n",
       "887708                        85.0      3952.0                  61300.0   \n",
       "719598                        70.0      2422.0                  46400.0   \n",
       "\n",
       "        tract_to_msa_income_pct  num_owner_occupied_units  \\\n",
       "310650                    98.81                    1305.0   \n",
       "630129                   144.06                    1420.0   \n",
       "715484                   104.62                     853.0   \n",
       "887708                    90.93                    1272.0   \n",
       "719598                    88.37                     650.0   \n",
       "\n",
       "        num_1_to_4_family_units  ...  \\\n",
       "310650                   1631.0  ...   \n",
       "630129                   1450.0  ...   \n",
       "715484                   1076.0  ...   \n",
       "887708                   1666.0  ...   \n",
       "719598                   1006.0  ...   \n",
       "\n",
       "        purchaser_type_Life insurance company, credit union, mortgage bank, or finance company  \\\n",
       "310650                                                  0                                        \n",
       "630129                                                  0                                        \n",
       "715484                                                  0                                        \n",
       "887708                                                  0                                        \n",
       "719598                                                  0                                        \n",
       "\n",
       "        purchaser_type_Loan was not originated or was not sold in calendar year covered by register  \\\n",
       "310650                                                  0                                             \n",
       "630129                                                  1                                             \n",
       "715484                                                  0                                             \n",
       "887708                                                  1                                             \n",
       "719598                                                  1                                             \n",
       "\n",
       "        purchaser_type_Other type of purchaser  \\\n",
       "310650                                       0   \n",
       "630129                                       0   \n",
       "715484                                       0   \n",
       "887708                                       0   \n",
       "719598                                       0   \n",
       "\n",
       "        purchaser_type_Private securitization  hoepa_status_HOEPA loan  \\\n",
       "310650                                      0                        0   \n",
       "630129                                      0                        0   \n",
       "715484                                      0                        0   \n",
       "887708                                      0                        0   \n",
       "719598                                      0                        0   \n",
       "\n",
       "        hoepa_status_Not a HOEPA loan  \\\n",
       "310650                              1   \n",
       "630129                              1   \n",
       "715484                              1   \n",
       "887708                              1   \n",
       "719598                              1   \n",
       "\n",
       "        lien_status_Not applicable (purchased loans)  \\\n",
       "310650                                             0   \n",
       "630129                                             0   \n",
       "715484                                             0   \n",
       "887708                                             0   \n",
       "719598                                             0   \n",
       "\n",
       "        lien_status_Not secured by a lien  \\\n",
       "310650                                  0   \n",
       "630129                                  0   \n",
       "715484                                  0   \n",
       "887708                                  0   \n",
       "719598                                  0   \n",
       "\n",
       "        lien_status_Secured by a first lien  \\\n",
       "310650                                    1   \n",
       "630129                                    1   \n",
       "715484                                    1   \n",
       "887708                                    0   \n",
       "719598                                    1   \n",
       "\n",
       "        lien_status_Secured by a subordinate lien  \n",
       "310650                                          0  \n",
       "630129                                          0  \n",
       "715484                                          0  \n",
       "887708                                          1  \n",
       "719598                                          0  \n",
       "\n",
       "[5 rows x 44 columns]"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "features = pd.read_pickle('data.pkl')\n",
    "features.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 999999 entries, 310650 to 875688\n",
      "Data columns (total 44 columns):\n",
      "as_of_year                                                                                     999999 non-null int16\n",
      "occupancy                                                                                      999999 non-null int8\n",
      "loan_amt_thousands                                                                             999999 non-null float64\n",
      "county_code                                                                                    999999 non-null float64\n",
      "applicant_income_thousands                                                                     999999 non-null float64\n",
      "population                                                                                     999999 non-null float64\n",
      "ffiec_median_fam_income                                                                        999999 non-null float64\n",
      "tract_to_msa_income_pct                                                                        999999 non-null float64\n",
      "num_owner_occupied_units                                                                       999999 non-null float64\n",
      "num_1_to_4_family_units                                                                        999999 non-null float64\n",
      "agency_code_Consumer Financial Protection Bureau (CFPB)                                        999999 non-null uint8\n",
      "agency_code_Department of Housing and Urban Development (HUD)                                  999999 non-null uint8\n",
      "agency_code_Federal Deposit Insurance Corporation (FDIC)                                       999999 non-null uint8\n",
      "agency_code_Federal Reserve System (FRS)                                                       999999 non-null uint8\n",
      "agency_code_National Credit Union Administration (NCUA)                                        999999 non-null uint8\n",
      "agency_code_Office of the Comptroller of the Currency (OCC)                                    999999 non-null uint8\n",
      "loan_type_Conventional (any loan other than FHA, VA, FSA, or RHS loans)                        999999 non-null uint8\n",
      "loan_type_FHA-insured (Federal Housing Administration)                                         999999 non-null uint8\n",
      "loan_type_FSA/RHS (Farm Service Agency or Rural Housing Service)                               999999 non-null uint8\n",
      "loan_type_VA-guaranteed (Veterans Administration)                                              999999 non-null uint8\n",
      "property_type_Manufactured housing                                                             999999 non-null uint8\n",
      "property_type_One to four-family (other than manufactured housing)                             999999 non-null uint8\n",
      "loan_purpose_Home improvement                                                                  999999 non-null uint8\n",
      "loan_purpose_Home purchase                                                                     999999 non-null uint8\n",
      "loan_purpose_Refinancing                                                                       999999 non-null uint8\n",
      "preapproval_Not applicable                                                                     999999 non-null uint8\n",
      "preapproval_Preapproval was not requested                                                      999999 non-null uint8\n",
      "preapproval_Preapproval was requested                                                          999999 non-null uint8\n",
      "purchaser_type_Affiliate institution                                                           999999 non-null uint8\n",
      "purchaser_type_Commercial bank, savings bank or savings association                            999999 non-null uint8\n",
      "purchaser_type_Fannie Mae (FNMA)                                                               999999 non-null uint8\n",
      "purchaser_type_Farmer Mac (FAMC)                                                               999999 non-null uint8\n",
      "purchaser_type_Freddie Mac (FHLMC)                                                             999999 non-null uint8\n",
      "purchaser_type_Ginnie Mae (GNMA)                                                               999999 non-null uint8\n",
      "purchaser_type_Life insurance company, credit union, mortgage bank, or finance company         999999 non-null uint8\n",
      "purchaser_type_Loan was not originated or was not sold in calendar year covered by register    999999 non-null uint8\n",
      "purchaser_type_Other type of purchaser                                                         999999 non-null uint8\n",
      "purchaser_type_Private securitization                                                          999999 non-null uint8\n",
      "hoepa_status_HOEPA loan                                                                        999999 non-null uint8\n",
      "hoepa_status_Not a HOEPA loan                                                                  999999 non-null uint8\n",
      "lien_status_Not applicable (purchased loans)                                                   999999 non-null uint8\n",
      "lien_status_Not secured by a lien                                                              999999 non-null uint8\n",
      "lien_status_Secured by a first lien                                                            999999 non-null uint8\n",
      "lien_status_Secured by a subordinate lien                                                      999999 non-null uint8\n",
      "dtypes: float64(8), int16(1), int8(1), uint8(34)\n",
      "memory usage: 104.0 MB\n"
     ]
    }
   ],
   "source": [
    "features.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the test features and labels into numpy arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Developing machine learning models in Python often requires the use of NumPy arrays.  Recall that NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays.  NumPy arrays are efficient data structures for working with data in Python, and machine learning models like those in the scikit-learn library, and deep learning models like those in the Keras library, expect input data in the format of NumPy arrays and make predictions in the format of NumPy arrays.  As such, it is common to need to save NumPy arrays to file.  Note that the data info reveals the following datatypes dtypes: float64(8), int16(1), int8(1), uint8(34) -- and no strings or \"objects\". So, let's now load the features and labels into numpy arrays.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "57KQ_XX2FEdl"
   },
   "outputs": [],
   "source": [
    "x_test = np.load('x_test.npy')\n",
    "y_test = np.load('y_test.npy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at the contents of the 'x_test.npy' file.  You can see the \"array\" structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2.016e+03 1.000e+00 4.170e+02 ... 0.000e+00 1.000e+00 0.000e+00]\n",
      " [2.016e+03 1.000e+00 2.760e+02 ... 0.000e+00 1.000e+00 0.000e+00]\n",
      " [2.016e+03 1.000e+00 6.000e+01 ... 0.000e+00 1.000e+00 0.000e+00]\n",
      " ...\n",
      " [2.016e+03 1.000e+00 5.000e+02 ... 0.000e+00 0.000e+00 0.000e+00]\n",
      " [2.016e+03 1.000e+00 1.100e+02 ... 0.000e+00 1.000e+00 0.000e+00]\n",
      " [2.016e+03 1.000e+00 3.680e+02 ... 0.000e+00 1.000e+00 0.000e+00]]\n"
     ]
    }
   ],
   "source": [
    "print(x_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combine the features and labels into one array for the What-if Tool\n",
    "\n",
    "Note that the numpy.hstack() function is used to stack the sequence of input arrays horizontally (i.e. column wise) to make a single array.  In the following example, the numpy matrix is reshaped into a vector using the reshape function with .reshape((-1, 1) to convert the array into a single column matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "hoFIrCQfFgvm"
   },
   "outputs": [],
   "source": [
    "test_examples = np.hstack((x_test,y_test.reshape(-1,1)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-8xNn8EhgUi7"
   },
   "source": [
    "## Using the What-if Tool to interpret our model\n",
    "With our test examples ready, we can now connect our model to the What-if Tool using the `WitWidget`. To use the What-if Tool with Cloud AI Platform, we need to send it:\n",
    "* A Python list of our test features + ground truth labels\n",
    "* Optionally, the names of our columns\n",
    "* Our Cloud project, model, and version name (we've created a public one for you to play around with)\n",
    "\n",
    "See the next cell for some exploration ideas in the What-if Tool."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Create a What-if Tool visualization\n",
    "\n",
    "This prediction adjustment function is needed as this xgboost model's prediction returns just a score for the positive class of the binary classification, whereas the What-If Tool expects a list of scores for each class (in this case, both the negative class and the positive class).  \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE:** The WIT may take a minute to load.  While it is loading, review the parameters that are defined in the next cell, BUT NOT RUN IT, it is simply for reference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# ******** DO NOT RUN THIS CELL ********\n",
    "\n",
    "# TODO 1\n",
    "\n",
    "PROJECT_ID = 'YOUR_PROJECT_ID'\n",
    "MODEL_NAME = 'YOUR_MODEL_NAME'\n",
    "VERSION_NAME = 'YOUR_VERSION_NAME'\n",
    "TARGET_FEATURE = 'mortgage_status'\n",
    "LABEL_VOCAB = ['denied', 'approved']\n",
    "\n",
    "# TODO 1a\n",
    "\n",
    "config_builder = (WitConfigBuilder(test_examples.tolist(), features.columns.tolist() + ['mortgage_status'])\n",
    "  .set_ai_platform_model(PROJECT_ID, MODEL_NAME, VERSION_NAME, adjust_prediction=adjust_prediction)\n",
    "  .set_target_feature(TARGET_FEATURE)\n",
    "  .set_label_vocab(LABEL_VOCAB))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run this cell to load the WIT config builder.  **NOTE:** The WIT may take a minute to load"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "dqAbAmxkgW4p"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.container { width:100% !important; }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "654bf83aca0642c78308122d31fc000f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "WitWidget(config={'use_aip': True, 'model_name': 'xgb_mortgage', 'uses_json_list': True, 'get_explanations': T…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# TODO 1b\n",
    "\n",
    "def adjust_prediction(pred):\n",
    "  return [1 - pred, pred]\n",
    "\n",
    "config_builder = (WitConfigBuilder(test_examples.tolist(), features.columns.tolist() + ['mortgage_status'])\n",
    "  .set_ai_platform_model('wit-caip-demos', 'xgb_mortgage', 'v1', adjust_prediction=adjust_prediction)\n",
    "  .set_target_feature('mortgage_status')\n",
    "  .set_label_vocab(['denied', 'approved']))\n",
    "WitWidget(config_builder, height=800)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_B2BskDk55rk"
   },
   "source": [
    "## What-if Tool exploration using the XGBoost Model\n",
    "\n",
    "#### TODO 2\n",
    "\n",
    "* **Individual data points**: The default graph shows all data points from the test set, colored by their ground truth label (approved or denied)\n",
    "  * Try selecting data points close to the middle and tweaking some of their feature values. Then run inference again to see if the model prediction changes\n",
    "  * Select a data point and then move the \"Show nearest counterfactual datapoint\" slider to the right. This will highlight a data point with feature values closest to your original one, but with a different prediction\n",
    "\n",
    "####  TODO 2a\n",
    "\n",
    "* **Binning data**: Create separate graphs for individual features\n",
    "  * From the \"Binning - X axis\" dropdown, try selecting one of the agency codes, for example \"Department of Housing and Urban Development (HUD)\". This will create 2 separate graphs, one for loan applications from the HUD (graph labeled 1), and one for all other agencies (graph labeled 0). This shows us that loans from this agency are more likely to be denied\n",
    "\n",
    "####  TODO 2b\n",
    "\n",
    "* **Exploring overall performance**: Click on the \"Performance & Fairness\" tab to view overall performance statistics on the model's results on the provided dataset, including confusion matrices, PR curves, and ROC curves.\n",
    "   * Experiment with the threshold slider, raising and lowering the positive classification score the model needs to return before it decides to predict \"approved\" for the loan, and see how it changes accuracy, false positives, and false negatives.\n",
    "   * On the left side \"Slice by\" menu, select \"loan_purpose_Home purchase\". You'll now see performance on the two subsets of your data: the \"0\" slice shows when the loan is not for a home purchase, and the \"1\" slice is for when the loan is for a home purchase. Notice that the model's false positive rate is much higher on loans for home purchases. If you expand the rows to look at the confusion matrices, you can see that the model predicts \"approved\" more often for home purchase loans.\n",
    "   * You can use the optimization buttons on the left side to have the tool auto-select different positive classification thresholds for each slice in order to achieve different goals. If you select the \"Demographic parity\" button, then the two thresholds will be adjusted so that the model predicts \"approved\" for a similar percentage of applicants in both slices. What does this do to the accuracy, false positives and false negatives for each slice?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2020 Google Inc.\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
    "http://www.apache.org/licenses/LICENSE-2.0\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "What-If Tool with XGBoost Cloud AI Platform Model",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
